Cross-validation for selecting a model selection procedure
Abstract
While there are various model selection methods, an important unanswered question is how to choose among them for the data at hand. The difficulty arises because the targeted behaviors of model selection procedures depend heavily on assumptions about the data-generating process that are uncheckable or difficult to check. Fortunately, cross-validation (CV) provides a general tool for solving this problem. In this work, results are provided on how to apply CV to consistently choose the best method, yielding new insights and guidance for a potentially vast range of applications. In addition, we address several seemingly widespread misconceptions about CV. © 2015 Elsevier B.V. All rights reserved.
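The idea of using CV to pick a model selection procedure can be illustrated with a minimal sketch. It is not the paper's algorithm: the two candidate procedures (AIC and BIC over nested linear models), the repeated random half-splits, the comparison by mean held-out squared error, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def fit_ols(X, y):
    # Least-squares coefficients for the given design matrix.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def select_nested(X, y, penalty):
    # Choose k (number of leading columns) minimizing n*log(RSS/n) + penalty*k.
    n, p = X.shape
    best_k, best_score = 1, np.inf
    for k in range(1, p + 1):
        coef = fit_ols(X[:, :k], y)
        rss = float(np.sum((y - X[:, :k] @ coef) ** 2))
        score = n * np.log(rss / n) + penalty * k
        if score < best_score:
            best_k, best_score = k, score
    return best_k

def aic(X, y):
    # Hypothetical candidate procedure 1: AIC-style penalty of 2 per parameter.
    return select_nested(X, y, penalty=2.0)

def bic(X, y):
    # Hypothetical candidate procedure 2: BIC-style penalty of log(n) per parameter.
    return select_nested(X, y, penalty=np.log(len(y)))

def cv_choose_procedure(X, y, procedures, n_splits=50, train_frac=0.5, seed=0):
    # For each random split: run every procedure on the training part,
    # refit the model it selects, and score it on the held-out part.
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(train_frac * n)
    errors = {name: [] for name in procedures}
    for _ in range(n_splits):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        for name, proc in procedures.items():
            k = proc(X[tr], y[tr])
            coef = fit_ols(X[tr][:, :k], y[tr])
            errors[name].append(np.mean((y[te] - X[te][:, :k] @ coef) ** 2))
    mean_err = {name: float(np.mean(v)) for name, v in errors.items()}
    # Return the procedure with the smallest average held-out error.
    return min(mean_err, key=mean_err.get), mean_err

# Synthetic data (an assumption for the demo): only the first two columns matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = X[:, :2] @ np.array([1.5, -2.0]) + rng.normal(size=200)
best, mean_err = cv_choose_procedure(X, y, {"AIC": aic, "BIC": bic})
print(best, mean_err)
```

The winning procedure would then be rerun on the full data set to produce the final model. The split ratio and the number of splits are tuning choices in this sketch, not recommendations taken from the paper.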
Similar resources
Asymptotic optimality of full cross-validation for selecting linear regression models
For the problem of model selection, full cross-validation has been proposed as an alternative criterion to traditional cross-validation, particularly in cases where the latter is not well defined. To justify the use of the new proposal, we show that under some conditions both criteria share the same asymptotic optimality property when selecting among linear regression models.
Cross-validation pitfalls when selecting and assessing regression and classification models
Background: We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. ...
Linear Model Selection by Cross-Validation
Estimator selection in the Gaussian setting
We consider the problem of estimating the mean f of a Gaussian vector Y with independent components of common unknown variance σ². Our estimation procedure is based on estimator selection. More precisely, we start with an arbitrary and possibly infinite collection F of estimators of f based on Y and, with the same data Y, aim at selecting an estimator among F with the smallest Euclidean risk. N...
A general, prediction error-based criterion for selecting model complexity for high-dimensional survival models.
When fitting predictive survival models to high-dimensional data, an adequate criterion for selecting model complexity is needed to avoid overfitting. The complexity parameter is typically selected by the predictive partial log-likelihood (PLL) estimated via cross-validation. As an alternative criterion, we propose a relative version of the integrated prediction error curve (IPEC), which can be...
Publication date: 2015